Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset

Authors

  • Niloofar Arshadi
  • Billy Chang
  • Rafal Kustra
Abstract

In this paper, we apply a gradient-boosting machine predictive model to the rheumatoid arthritis data to predict case-control status. A QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy for dealing with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variation and building a separate predictive model in each cluster. This allows us to control for ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only produces groups of more similar ethnicity but also, as our results show, increases the accuracy of our model compared with the non-clustered approach. The highest accuracy is achieved by the model adjusted for population stratification, in which the genetic axes of variation are included among the predictors, although this result may be misleading given the confounding effects.
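The cluster-then-fit strategy described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is assumed; the axes of genetic variation are assumed to be precomputed (for example, as principal-component scores); and a tiny squared-error boosting of decision stumps stands in for the gradient-boosting machine. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def nearest(centers, point):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (labels, centers)."""
    centers = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [nearest(centers, p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centers

def fit_stump(X, residuals):
    """Best one-split regression stump (feature, threshold, left, right) by squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:] if best else None

def fit_boosted(X, y, rounds=20, rate=0.5):
    """Stand-in for a GBM: boost stumps on the residuals of a squared-error loss."""
    base, stumps = mean(y), []
    pred = [base] * len(y)
    for _ in range(rounds):
        stump = fit_stump(X, [yi - pi for yi, pi in zip(y, pred)])
        if stump is None:  # no feature varies within this cluster
            break
        j, t, lv, rv = stump
        stumps.append(stump)
        pred = [p + rate * (lv if row[j] <= t else rv) for row, p in zip(X, pred)]
    return base, rate, stumps

def predict(model, row):
    """Predicted case (1) / control (0) status for one genotype row."""
    base, rate, stumps = model
    score = base + sum(rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return 1 if score >= 0.5 else 0

def fit_per_cluster(axes, X, y, k=2):
    """Cluster subjects on the axes of variation, then fit one model per cluster."""
    labels, centers = kmeans(axes, k)
    models = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            models[c] = fit_boosted([X[i] for i in idx], [y[i] for i in idx])
    return centers, models

def predict_stratified(centers, models, axis_point, row):
    """Route a subject to the nearest cluster and use that cluster's model."""
    return predict(models[nearest(centers, axis_point)], row)

# Invented toy data: one axis of variation, two SNPs coded 0/1/2.
# The first SNP has opposite effects in the two ancestry groups.
axes = [(-1.0,), (-1.1,), (-0.9,), (-1.2,), (1.0,), (1.1,), (0.9,), (1.2,)]
X = [[0, 1], [2, 0], [0, 2], [1, 1]] * 2
y = [0, 1, 0, 1, 1, 0, 1, 0]

centers, models = fit_per_cluster(axes, X, y, k=2)
print(predict_stratified(centers, models, (-1.05,), [2, 0]))
print(predict_stratified(centers, models, (1.05,), [2, 0]))
```

In the toy data the same genotype predicts opposite outcomes in the two groups, so a single pooled model would see no signal at that SNP, while the per-cluster models recover it. Ancestry itself never enters the predictor set; it only determines which model is consulted.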

Related articles

Single-nucleotide polymorphism of rs11061971 (+219 A>T) in adiponectin receptor 2 (AdipoR2) gene and its association with risk of type 2 diabetes among an Iranian population

Background and Objectives: Genetic modifications in the adiponectin receptor 2 (AdipoR2) gene can affect phenotypes associated with insulin resistance and diabetes. The purpose of this study was to evaluate the possible role of genetic modifications in the AdipoR2 gene, to determine the frequency of genotypes and polymorphism alleles of this gene at rs11061971 (+219 A>T), and to investigate its...

Accommodating population stratification in case-control association analysis: a new test and its application to genome-wide study on rheumatoid arthritis

It is well known that conventional association tests can lead to excessive false positives when there is population stratification. We propose a new test for detecting genetic association with a case-control study design. Unlike some other methods for handling population stratification, we treat the cases as a population and the controls as another one even though each of them may be a mixture ...

Association study of two single nucleotide polymorphisms rs10757278 and rs1333049 with atherosclerosis, a case-control study from Iraq

Atherosclerosis is one of the most important coronary artery diseases (CAD), caused by lipid accumulation, hypertension, smoking, and many other environmental and genetic factors. Genetic variations in rs10757278 and rs1333049 have been reported to be correlated with CAD. In the present study, 100 blood samples were collected (50 CAD patients and 50 appeared to be healthy con...

Evaluation of the Association of Htr2a Gene Rs6313 Polymorphism with Heroin Dependence in a Sample from Northwest Iran

Introduction: Heroin dependence is a chronic relapsing disorder caused by a combination of genetic, epigenetic, and environmental factors. The genetic contribution in the vulnerability to heroin dependence is 40%-60%. Alterations in dopamine transport in the CNS are implicated in drug and alcohol dependence, and according to linkage studies, the HTR2A rs6313 single nucleotide polymorphism plays...

Genetic polymorphisms in the estrogen receptor-α gene codon 325 (CCC→CCG) and risk of breast cancer among Iranian women: a case control study

Background: The Iranian breast cancer patients are relatively younger than their Western counterparts. Evidence suggests that alterations in estrogen signaling pathways, including estrogen receptor-α (ER-α), occur during breast cancer development in Caucasians. Epidemiologic studies have revealed that age-incidence patterns of breast cancer in Asians differ from those in Cauca...

Volume: 3

Publication date: 2009